A Large Scale, Cross-disease Family Health History Data Set

Authors

  • Hong Yu
  • George Hripcsak
Abstract

Introduction: A family health history data set needs to be evaluated before it is applied to the study of genetic diseases, genetic counseling, or epidemiological studies. We have obtained a large-scale, cross-disease family health history data set (FhhDS) from electronic discharge summaries at Columbia Presbyterian Medical Center by using a pattern matching parser that we developed [1]. FhhDS currently contains the family health histories of 22,292 patients. Here we evaluate FhhDS by examining the scale, coverage, and completeness of the data set.

Background: The Columbia Presbyterian Medical Center (CPMC) maintains the Columbia Clinical Repository, which contains many electronic discharge summaries [2]. All electronic discharge summaries are for inpatients and are obtained by physician dictation. We have developed and evaluated a pattern matching parser, the Family Health History Extraction Processor (FhhEP), that extracts family health history from narrative electronic discharge summaries and builds database-ready output [1]. The output consists of health history findings with ten attributes: unique identification number of the discharge summary, family member, number of family members, disease, trait, presence or absence of the disease, certainty, death condition, date of developing the disease or of death, and location of the disease. From the unique identification number of the discharge summary, we can retrieve the patient's MRN and discharge date. Five physicians evaluated the natural language processing parser, and the majority physician opinion served as the reference standard. Sensitivity is the number of positive answers in common between the parser findings and the reference standard, divided by the number of positive answers in the reference standard. Precision is the number of positive answers in common between the parser findings and the reference standard, divided by the total number of parser findings. The sensitivities for all ten attributes, five attributes (family member, number of family members, disease, trait, and presence of disease), four attributes (family member, disease, trait, and presence of disease), and three attributes (disease, trait, and presence of disease) are 75.6%, 85.3%, 85.7%, and 93.0%, respectively. The corresponding precisions for the same four attribute sets are 72.0%, 72.3%, 72.3%, and 79.5%, respectively [1].

Methods: We obtained the FhhDS by running the FhhEP over all the electronic discharge summaries at CPMC, one year at a time, from 1992 to 1998. We measured the …
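The sensitivity and precision definitions given in the Background can be illustrated with a minimal sketch. It assumes that each finding is a tuple of attribute values and that two findings agree only when every compared attribute matches. The function names, the `project` helper, and the sample findings are hypothetical and are not taken from FhhEP or its evaluation. The sketch also shows one plausible reason the reported scores rise as fewer attributes are compared: a finding that is wrong on one attribute can still match once that attribute is dropped.

```python
from typing import Iterable, Set, Tuple

Finding = Tuple[str, ...]


def sensitivity(parser: Set[Finding], reference: Set[Finding]) -> float:
    """Findings in common divided by the number of reference findings."""
    if not reference:
        raise ValueError("reference standard is empty")
    return len(parser & reference) / len(reference)


def precision(parser: Set[Finding], reference: Set[Finding]) -> float:
    """Findings in common divided by the total number of parser findings."""
    if not parser:
        raise ValueError("parser produced no findings")
    return len(parser & reference) / len(parser)


def project(findings: Iterable[Finding], keep: Tuple[int, ...]) -> Set[Finding]:
    """Restrict each finding to the attribute positions listed in `keep`."""
    return {tuple(f[i] for i in keep) for f in findings}


# Hypothetical findings: (summary id, family member, disease, presence).
reference_findings = {
    ("s1", "mother", "diabetes", "present"),
    ("s1", "father", "hypertension", "present"),
    ("s2", "sister", "asthma", "present"),
    ("s3", "brother", "cancer", "absent"),
}
parser_findings = {
    ("s1", "mother", "diabetes", "present"),
    ("s1", "father", "hypertension", "present"),
    ("s2", "brother", "asthma", "present"),   # wrong family member
    ("s3", "brother", "cancer", "absent"),
    ("s3", "aunt", "stroke", "present"),      # not in the reference standard
}

# Compare on all four attributes.
print(sensitivity(parser_findings, reference_findings))  # 0.75
print(precision(parser_findings, reference_findings))    # 0.6

# Compare while ignoring the family member (positions 0, 2, 3 only).
no_member = (0, 2, 3)
print(sensitivity(project(parser_findings, no_member),
                  project(reference_findings, no_member)))  # 1.0
print(precision(project(parser_findings, no_member),
                project(reference_findings, no_member)))    # 0.8
```

Representing findings as sets keeps the "in common" count a plain set intersection. An evaluation that must count duplicate findings separately would need a multiset instead.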

Similar resources

Prevalence and Risk Factors for Chronic Kidney Disease in Family Relatives of a Cameroonian Population of Hemodialysis Patients: A Cross-Sectional Study

Background: In sub-Saharan Africa (SSA), the trend in the number of patients admitted for maintenance hemodialysis is on the rise. The identification of risk factors for chronic kidney disease (CKD) ensures adequate primary and secondary preventive measures geared at reducing the burden of CKD in low-resource settings. A family history of CKD is an established risk factor for C...

A Cumulative Logit Model for Examining the Risk Factors of Endometriosis and Its Severity

Background: Endometriosis is one of the prevalent chronic diseases in women that causes infertility and other problems. Since severity of this disease is expressed in ordinal scale, the aim of this study is to analyze risk factors and progress of the disease by ordinal logistic regression and cumulative logit model. Methods: In this cross-sectional study, we studied infertile women that referr...

The Study of Self-Assessed Health among Elderly Women in Shiraz and Yasuj Cities

Introduction: Women are facing inequalities including health, thus their health deserves attention. Investigating people’s health status from their viewpoints is an important measure in terms of public health of society and an indicator to determine the efficiency of health system.  This study aimed at evaluating the self-assessed health of older women living in Shiraz and Yasuj. ...

Knowledge and Attitudes towards Cardiovascular Disease in a Population of North Western Turkey: A Cross-Sectional Survey

Background: Cardiovascular diseases risk factors are preventable in population. Nurses can act as crucial communicators of individuals identified with the noted risk factors. Objective: The aim of the study is to assess the knowledge and attitudes of a population in Turkey towards risks of cardiovascular disease. Methods: A descriptive cross-sectional study was carried out between June and Augu...

Cross-sectional analysis of BioBank Japan clinical data: A large cohort of 200,000 patients with 47 common diseases

BACKGROUND To implement personalized medicine, we established a large-scale patient cohort, BioBank Japan, in 2003. BioBank Japan contains DNA, serum, and clinical information derived from approximately 200,000 patients with 47 diseases. Serum and clinical information were collected annually until 2012. METHODS We analyzed clinical information of participants at enrollment, including age, sex...

Publication date: 2000